
Canonicalize volatile system prompt headers in OpenAI paths#528

Open
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:604/issue-524-prompt-canonicalization

Conversation

@Thump604
Collaborator

Refs #524.

Summary

  • Add a small static system-prompt canonicalization helper with the currently validated x-anthropic-billing-header stripper.
  • Apply it to prepared system-role messages in Chat Completions and Responses paths before engine execution.
  • Cover the helper, Chat Completions preparation, Responses preparation, and existing Anthropic adapter behavior.

Local repro / observed behavior

On current waybarrios/vllm-mlx main (f068991 when this branch was cut), the Anthropic Messages adapter removes x-anthropic-billing-header: lines from request.system in vllm_mlx/api/anthropic_adapter.py::anthropic_to_openai, but the OpenAI server paths do not apply the same canonicalization:

  • vllm_mlx/server.py::_prepare_chat_messages
  • vllm_mlx/server.py::_prepare_responses_request

A Chat Completions system message or Responses instructions value containing:

x-anthropic-billing-header: account=abc; cch=rotating-hash

was still present in the prepared system message sent to the engine.
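A hypothetical minimal repro payload for the Chat Completions path (model name and surrounding fields are illustrative, not taken from the PR; the key point is the header line embedded in the system message):

```python
# Illustrative Chat Completions request body; on main, the header line below
# reaches the engine unmodified via _prepare_chat_messages.
payload = {
    "model": "example-model",  # placeholder, not from the PR
    "messages": [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant.\n"
                "x-anthropic-billing-header: account=abc; cch=rotating-hash"
            ),
        },
        {"role": "user", "content": "Hello"},
    ],
}
assert "x-anthropic-billing-header" in payload["messages"][0]["content"]
```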

Expected behavior

The validated billing-header line is non-semantic request metadata and should be removed from system-role text before engine execution. User-role content with the same text is preserved, and user-visible timestamp text is not stripped.

Minimal patch shape

  • New vllm_mlx/api/prompt_canonicalize.py module with a static stripper list and canonicalize_system_prompt().
  • New canonicalize_system_messages() helper that copies only changed system messages.
  • Call the helper after existing message normalization in Chat Completions and after Responses input conversion.
  • Do not add runtime registration APIs or speculative strippers.
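A sketch of what the new module could look like, assembled from the names and regex quoted in this PR and review (the `_STRIPPERS` tuple, `canonicalize_system_prompt()`, `canonicalize_system_messages()`, and the `(?im)`-anchored pattern); the actual implementation may differ in detail:

```python
import re

# Static stripper list; currently only the validated billing-header line.
_STRIPPERS = (
    re.compile(r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)"),
)


def canonicalize_system_prompt(text):
    """Remove non-semantic metadata lines from system-role text.

    None/empty input passes through unchanged.
    """
    if not text:
        return text
    for pattern in _STRIPPERS:
        text = pattern.sub("", text)
    return text


def canonicalize_system_messages(messages):
    """Canonicalize system-role string content without mutating the input.

    Copies only the messages that actually change; returns the original
    list object when nothing changed.
    """
    out = None
    for i, msg in enumerate(messages):
        if msg.get("role") != "system" or not isinstance(msg.get("content"), str):
            continue
        cleaned = canonicalize_system_prompt(msg["content"])
        if cleaned != msg["content"]:
            if out is None:
                out = list(messages)
            out[i] = {**msg, "content": cleaned}
    return out if out is not None else messages
```

The copy-on-change shape mirrors the review's observation that the helper avoids mutation and returns the original list when no system message needed stripping.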

Explicitly not claimed

  • This does not add timestamp, MCP UUID, or session-ID strippers.
  • This does not change the SimpleEngine system-prefix KV-cache logic from PR #523 (feat: extend system-prompt KV cache to pure-LLM stream_chat path).
  • This does not include new TTFT or cache-hit-rate benchmark results.
  • This does not change media extraction, tool parsing, sampling, or decode controls.

Verification

AI_RUNTIME_BYPASS_SAFETY_GATE=1 PYTHONPATH=/opt/ai-runtime/worktrees/vllm-mlx/issue-524-prompt-canonicalization /opt/ai-runtime/venv-live/bin/python -m pytest tests/test_prompt_canonicalize.py tests/test_responses_api.py tests/test_anthropic_adapter.py tests/test_server.py::TestPromptCanonicalization -q
# 74 passed

uvx ruff check vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# All checks passed

/opt/ai-runtime/venv-live/bin/python -m black --check --target-version py312 vllm_mlx/api/prompt_canonicalize.py vllm_mlx/server.py tests/test_prompt_canonicalize.py tests/test_server.py tests/test_responses_api.py
# 5 files would be left unchanged

git diff --check
# clean

Collaborator

@janhilgard janhilgard left a comment


Clean implementation. The prompt_canonicalize.py module is well-isolated with an extensible _STRIPPERS tuple, and canonicalize_system_messages() correctly avoids mutation (copies only changed messages, returns the original list when nothing changes).

Review

  • Regex (?im)^x-anthropic-billing-header:[^\n]*(?:\n|$) — multiline + case-insensitive + anchored. Correctly strips the full line including trailing newline.
  • Only targets system role — user content is preserved (explicitly verified by the chat completion test).
  • None passthrough is correct.
  • Placement in server.py is right: after _normalize_messages() but before media extraction.
  • Tests cover the unit helper (strip, idempotent, None/empty), Chat Completions path, and Responses path.
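The anchoring and flag behavior called out above can be demonstrated with a standalone snippet (the pattern is quoted from this review and assumed to match the module's compiled regex):

```python
import re

# Pattern as quoted in the review: multiline + case-insensitive + line-anchored.
pattern = re.compile(r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)")

text = (
    "Answer tersely.\n"
    "X-Anthropic-Billing-Header: account=abc; cch=rotating-hash\n"
    "Never echo request metadata."
)
once = pattern.sub("", text)
# Full line removed, including its trailing newline, regardless of case.
assert once == "Answer tersely.\nNever echo request metadata."
# Idempotent: a second pass is a no-op.
assert pattern.sub("", once) == once
# The `$` alternative also removes a final header line with no trailing newline.
assert pattern.sub("", "x-anthropic-billing-header: account=abc") == ""
```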

Minor note

The Anthropic adapter in anthropic_adapter.py has its own inline regex (re.sub(r"x-anthropic-billing-header:[^\n]*\n?", "", system_text)) without (?im) flags or ^ anchor. Not blocking, but a future cleanup could have the adapter call canonicalize_system_prompt() instead of duplicating the pattern.
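The behavioral gap between the two patterns is easy to show (both regexes are quoted from this thread; the surrounding text is illustrative):

```python
import re

# The adapter's inline pattern as quoted above, versus the anchored module pattern.
adapter_pat = r"x-anthropic-billing-header:[^\n]*\n?"
anchored_pat = r"(?im)^x-anthropic-billing-header:[^\n]*(?:\n|$)"

text = "Note: x-anthropic-billing-header: foo is documented here.\nNext line.\n"
# Unanchored: also fires mid-line, eating the rest of that line.
assert re.sub(adapter_pat, "", text) == "Note: Next line.\n"
# Anchored: the mid-line occurrence is left alone.
assert re.sub(anchored_pat, "", text) == text
# The adapter pattern is also case-sensitive, so an upper-cased header survives it.
assert re.sub(adapter_pat, "", "X-Anthropic-Billing-Header: a\n") == "X-Anthropic-Billing-Header: a\n"
```

This illustrates why routing the adapter through canonicalize_system_prompt() would be a worthwhile cleanup rather than a pure deduplication.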

CI 9/9. LGTM.
